Annisa Novtariany
Universitas Nusa Mandiri

Published: 2 documents
Journal : Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

Exploring feature selection techniques on Classification Algorithms for Predicting Type 2 Diabetes at Early Stage
Mila Desi Anasanti; Khairunisa Hilyati; Annisa Novtariany
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 6 No 5 (2022): Oktober 2022
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v6i5.4419

Abstract

Predicting Type 2 diabetes (T2D) at an early stage is critical for improved care and better T2D outcomes. Accurate and efficient T2D prediction relies on unbiased, relevant features. In this study, we searched for important predictors of T2D by integrating machine learning (ML) models for feature selection and classification, using data from 520 individuals who were newly diagnosed with diabetes or who would later develop it. We used standard ML classifiers: logistic regression (LR), Gaussian naive Bayes (NB), decision tree (DT), random forest (RF), support vector machine (SVM) with linear basis function, and k-nearest neighbors (KNN). We systematically explored the viability of one representative feature selection method from each family of techniques: a statistical filter method (F-score), an entropy-based filter method (mutual information), an ensemble-based filter method (random forest importance), and a stochastic optimization method, simultaneous perturbation feature selection and ranking (SpFSR). We used stratified 10-fold cross-validation and assessed discrimination, calibration, and clinical utility. RF with the full set of 16 features attained the highest accuracy (98%), and we then used RF as the wrapper classifier for selecting the important features. The combination of SpFSR and RF was the best model: its accuracy was not significantly different from that of the full feature set (P = 0.26, above the 0.05 threshold). These findings support the efficiency and usefulness of the proposed method for selecting the most important features in diabetes data: polyuria, gender, polydipsia, age, itching, sudden weight loss, delayed healing, and alopecia.
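
The evaluation setup described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic dataset as a stand-in for the study's data: only the sample count (520), the feature count (16), and the number of folds (10) come from the abstract. The choice of k = 8 mirrors the number of features listed at the end of the abstract, and the hyperparameters, the random seeds, and the helper names `make_rf` and `mi_score` are hypothetical. SpFSR is omitted because it is not part of scikit-learn. This is not the authors' code, and its scores say nothing about the published results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    f_classif,
    mutual_info_classif,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic placeholder data: 520 samples and 16 features, sizes taken from
# the abstract. The values themselves are random, not the study's data.
X, y = make_classification(
    n_samples=520, n_features=16, n_informative=8, n_redundant=2, random_state=0
)

# Stratified 10-fold cross-validation keeps the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def make_rf():
    """Hypothetical helper: a fresh random forest with assumed settings."""
    return RandomForestClassifier(n_estimators=50, random_state=0)


def mi_score(X, y):
    """Mutual information scores with a fixed seed for repeatability."""
    return mutual_info_classif(X, y, random_state=0)


# Each selector sits inside a pipeline, so it is refit on the training part
# of every fold and never sees the held-out fold.
candidates = {
    "all 16 features": make_rf(),
    "F-score, top 8": make_pipeline(SelectKBest(f_classif, k=8), make_rf()),
    "mutual information, top 8": make_pipeline(SelectKBest(mi_score, k=8), make_rf()),
    "RF importance, top 8": make_pipeline(
        # threshold=-np.inf makes max_features the only selection criterion.
        SelectFromModel(make_rf(), threshold=-np.inf, max_features=8),
        make_rf(),
    ),
}

scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    for name, model in candidates.items()
}

for name, fold_acc in scores.items():
    print(f"{name}: mean accuracy {fold_acc.mean():.3f} over {len(fold_acc)} folds")
```

Placing each selector inside the pipeline is a deliberate design choice: selecting features on the full dataset before cross-validation would leak information from the held-out folds into the accuracy estimate.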